
    Using object detection to extract structured content from documents

    Structured content such as figures, tables, graphs, captions, and other graphical material often captures the essence of a document. Experienced readers often review the graphical material in a document first to quickly grasp its contents. Identifying and extracting the structured content of a document, e.g., its graphical components, is therefore important in building a deeper semantic understanding of the document. Techniques presented herein automatically extract the structured content of documents. Machine-learning techniques, e.g., object detection and computer vision, are used to recognize and extract the structured content. The techniques work well regardless of the tool used to create the document. For example, the document can be a PDF file, a screenshot, or the output of a computer-aided design tool, among others. The techniques work across fields of study, publishing conventions, languages, and written scripts, and are robust to different formats of graphical content, e.g., vector and raster graphics.

    An Algorithm for Merging and Aligning Ontologies: Automation and Tool Support

    As researchers in the ontology-design field develop the content of a growing number of ontologies, the need for sharing and reusing this body of knowledge becomes increasingly critical. Aligning and merging existing ontologies, which is usually handled manually, often constitutes a large and tedious portion of the sharing process. We have developed SMART, an algorithm that provides a semi-automatic approach to ontology merging and alignment. SMART assists the ontology developer by performing certain tasks automatically and by guiding the developer to other tasks for which his intervention is required. SMART also determines possible inconsistencies in the state of the ontology that may result from the user’s actions, and suggests ways to remedy these inconsistencies.